Warning: This crate is experimental. It relies on implementation techniques that are hard to keep working for 100% of configurations. It may work fine for you, or it may crash, hang, or otherwise do the wrong thing. Its maintenance is not a high priority of the author. Support requests such as issues and pull requests may receive slow responses, or no response at all. Sorry!
This crate provides heap profiling and ad hoc profiling capabilities to Rust programs, similar to those provided by DHAT.
The heap profiling works by using a global allocator that wraps the system allocator, tracks all heap allocations, and on program exit writes data to file so it can be viewed with DHAT’s viewer. This corresponds to DHAT’s --mode=heap mode.
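The wrapping technique described above can be sketched with the standard library alone. This is a minimal illustration, not this crate's implementation: `CountingAlloc`, the two counters, and `totals` are hypothetical names, and a real profiler also records backtraces, lifetimes, and deallocations.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical counters, used only for this sketch.
static TOTAL_BLOCKS: AtomicU64 = AtomicU64::new(0);
static TOTAL_BYTES: AtomicU64 = AtomicU64::new(0);

/// Wraps the system allocator and counts every successful allocation.
struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Forward the request to the system allocator.
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            TOTAL_BLOCKS.fetch_add(1, Ordering::Relaxed);
            TOTAL_BYTES.fetch_add(layout.size() as u64, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static ALLOC: CountingAlloc = CountingAlloc;

/// Returns (blocks, bytes) allocated since program start.
fn totals() -> (u64, u64) {
    (
        TOTAL_BLOCKS.load(Ordering::Relaxed),
        TOTAL_BYTES.load(Ordering::Relaxed),
    )
}

fn main() {
    let (blocks_before, bytes_before) = totals();
    // `black_box` discourages the compiler from removing the unused allocation.
    let buffer: Vec<u8> = std::hint::black_box(Vec::with_capacity(1024));
    let (blocks_after, bytes_after) = totals();
    println!(
        "{} bytes in {} blocks",
        bytes_after - bytes_before,
        blocks_after - blocks_before
    );
    drop(buffer);
}
```

Because the allocator is global, the counters also see allocations made by the standard library and by other threads, which is why the sketch reports differences rather than absolute values.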
The ad hoc profiling is via a second mode of operation, where ad hoc events can be manually inserted into a Rust program for aggregation and viewing. This corresponds to DHAT’s --mode=ad-hoc mode.
dhat also supports heap usage testing, where you can write tests and then check that they allocated as much heap memory as you expected. This can be useful for performance-sensitive code.
§Motivation
DHAT is a powerful heap profiler that comes with Valgrind. This crate is a related but alternative choice for heap profiling Rust programs. DHAT and this crate have the following differences.
- This crate works on any platform, while DHAT only works on some platforms (Linux, mostly). (Note that DHAT’s viewer is just HTML+JS+CSS and should work in any modern web browser on any platform.)
- This crate typically causes a smaller slowdown than DHAT.
- This crate requires some modifications to a program’s source code and recompilation, while DHAT does not.
- This crate cannot track memory accesses the way DHAT does, because it does not instrument all memory loads and stores.
- This crate does not provide profiling of copy functions such as memcpy and strcpy, unlike DHAT.
- The backtraces produced by this crate may be better than those produced by DHAT.
- DHAT measures a program’s entire execution, but this crate only measures what happens within main. It will miss the small number of allocations that occur before or after main, within the Rust runtime.
- This crate enables heap usage testing.
§Configuration (profiling and testing)
In your Cargo.toml file, as well as specifying dhat as a dependency, you should (a) enable source line debug info, and (b) create a feature or two that lets you easily switch profiling on and off:
[profile.release]
debug = 1
[features]
dhat-heap = [] # if you are doing heap profiling
dhat-ad-hoc = [] # if you are doing ad hoc profiling
You should only use dhat in release builds. Debug builds are too slow to be useful.
§Setup (heap profiling)
For heap profiling, enable the global allocator by adding this code to your program:
#[cfg(feature = "dhat-heap")]
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;
Then add the following code to the very start of your main function:
#[cfg(feature = "dhat-heap")]
let _profiler = dhat::Profiler::new_heap();
Then run this command to enable heap profiling during the lifetime of the Profiler instance:
cargo run --features dhat-heap
dhat::Alloc is slower than the normal allocator, so it should only be enabled while profiling.
§Setup (ad hoc profiling)
Ad hoc profiling involves manually annotating hot code points and then aggregating the executed annotations in some fashion.
To do this, add the following code to the very start of your main function:
#[cfg(feature = "dhat-ad-hoc")]
let _profiler = dhat::Profiler::new_ad_hoc();
Then insert calls like this at points of interest:
#[cfg(feature = "dhat-ad-hoc")]
dhat::ad_hoc_event(100);
Then run this command to enable ad hoc profiling during the lifetime of the Profiler instance:
cargo run --features dhat-ad-hoc
For example, imagine you have a hot function that is called from many call sites. You might want to know how often it is called and which other functions called it the most. In that case, you would add an ad_hoc_event call to that function, and the data collected by this crate and viewed with DHAT’s viewer would show you exactly what you want to know.
The meaning of the integer argument to ad_hoc_event will depend on exactly what you are measuring. If there is no meaningful weight to give to an event, you can just use 1.
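To make the aggregation idea concrete, here is a standard-library-only sketch that sums weighted events per call site. It is a simplified model, not this crate's mechanism: `record_event`, `hot_function`, `caller_a`, `caller_b`, and `totals` are hypothetical names, and it keys events on a single caller location (via `#[track_caller]`) rather than on a full backtrace.

```rust
use std::collections::BTreeMap;
use std::panic::Location;
use std::sync::Mutex;

/// (file, line) of the code that called the annotated function.
type Site = (&'static str, u32);

/// Per-site totals: (number of events, sum of weights).
static EVENTS: Mutex<BTreeMap<Site, (u64, u64)>> = Mutex::new(BTreeMap::new());

/// Hypothetical stand-in for an ad hoc event with an integer weight.
#[track_caller]
fn record_event(weight: u64) {
    let location = Location::caller();
    let mut events = EVENTS.lock().unwrap();
    let entry = events
        .entry((location.file(), location.line()))
        .or_insert((0, 0));
    entry.0 += 1;
    entry.1 += weight;
}

/// The hot function under investigation; the weight is the input length.
#[track_caller]
fn hot_function(bytes: &[u8]) -> usize {
    record_event(bytes.len() as u64);
    bytes.iter().filter(|b| **b != 0).count()
}

fn caller_a() -> usize {
    hot_function(&[1, 2, 3])
}

fn caller_b() -> usize {
    hot_function(&[0; 10])
}

/// Returns (events, units) summed over all call sites.
fn totals() -> (u64, u64) {
    let events = EVENTS.lock().unwrap();
    events
        .values()
        .fold((0, 0), |acc, v| (acc.0 + v.0, acc.1 + v.1))
}

fn main() {
    for _ in 0..3 {
        caller_a();
    }
    caller_b();
    for ((file, line), (count, units)) in EVENTS.lock().unwrap().iter() {
        println!("{file}:{line}: {units} units in {count} events");
    }
    let (events, units) = totals();
    println!("Total: {units} units in {events} events");
}
```

Using the input length as the weight answers "how much work arrived from each caller"; passing 1 instead would answer "how many calls arrived from each caller".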
§Running
For both heap profiling and ad hoc profiling, the program will run more slowly than normal. The exact slowdown is hard to predict because it depends greatly on the program being profiled, but it can be large. (Even more so on Windows, because backtrace gathering can be drastically slower on Windows than on other platforms.)
When the Profiler is dropped at the end of main, some basic information will be printed to stderr. For heap profiling it will look like the following.
dhat: Total: 1,256 bytes in 6 blocks
dhat: At t-gmax: 1,256 bytes in 6 blocks
dhat: At t-end: 1,256 bytes in 6 blocks
dhat: The data has been saved to dhat-heap.json, and is viewable with dhat/dh_view.html
(“Blocks” is a synonym for “allocations”.)
For ad hoc profiling it will look like the following.
dhat: Total: 141 units in 11 events
dhat: The data has been saved to dhat-ad-hoc.json, and is viewable with dhat/dh_view.html
A file called dhat-heap.json (for heap profiling) or dhat-ad-hoc.json (for ad hoc profiling) will be written. It can be viewed in DHAT’s viewer.
If you don’t see this output, it may be because your program called std::process::exit, which exits a program without running any destructors. To work around this, explicitly call drop on the Profiler value just before exiting.
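The workaround can be illustrated without this crate. In the sketch below, `Report` is a hypothetical stand-in for any value that does its reporting in `Drop`, and `exit_with_report` and the `--bail` flag are invented for the example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times the destructor has run.
static REPORTS_WRITTEN: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a profiler: it reports when dropped.
struct Report;

impl Drop for Report {
    fn drop(&mut self) {
        REPORTS_WRITTEN.fetch_add(1, Ordering::SeqCst);
        eprintln!("report written");
    }
}

/// Drops the guard explicitly and then exits. Without the `drop` call the
/// destructor would not run, because `process::exit` does not run destructors.
fn exit_with_report(report: Report, code: i32) -> ! {
    drop(report);
    std::process::exit(code)
}

fn main() {
    let report = Report;
    // Hypothetical early-exit path, taken only when `--bail` is passed.
    if std::env::args().any(|arg| arg == "--bail") {
        exit_with_report(report, 1);
    }
    println!("finished normally");
    // On this path `report` is dropped as usual when `main` returns.
}
```

Routing every early exit through one helper like this keeps the explicit `drop` in a single place.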
When doing heap profiling, if you unexpectedly see zero allocations in the output it may be because you forgot to set dhat::Alloc as the global allocator.
When doing heap profiling it is recommended that the lifetime of the Profiler value cover all of main. But it is still possible for allocations and deallocations to occur outside of its lifetime. Such cases are handled in the following ways.
- Allocated before, untouched within: ignored.
- Allocated before, freed within: ignored.
- Allocated before, reallocated within: treated like a new allocation within.
- Allocated after: ignored.
These cases are not ideal, but it is impossible to do better. dhat deliberately provides no way to reset the heap profiling state mid-run precisely because it leaves open the possibility of many such occurrences.
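The four cases listed above can be modelled with a toy tracker that only knows about blocks it saw being allocated while running. This is an illustrative model under that assumption, not this crate's code: `Tracker` and its methods are hypothetical, and addresses are plain integers.

```rust
use std::collections::HashMap;

/// Toy model of a heap tracker.
#[derive(Default)]
struct Tracker {
    running: bool,
    /// Address -> size, for blocks allocated while running.
    live: HashMap<usize, usize>,
    total_blocks: u64,
}

impl Tracker {
    fn on_alloc(&mut self, addr: usize, size: usize) {
        // Allocations before or after the profiling window are ignored.
        if self.running {
            self.live.insert(addr, size);
            self.total_blocks += 1;
        }
    }

    fn on_dealloc(&mut self, addr: usize) {
        // Removing an unknown address is a no-op, so a block allocated
        // before the window and freed within it is ignored.
        if self.running {
            self.live.remove(&addr);
        }
    }

    fn on_realloc(&mut self, old_addr: usize, new_addr: usize, new_size: usize) {
        if !self.running {
            return;
        }
        match self.live.remove(&old_addr) {
            // Known block: it moves, but no new block is counted.
            Some(_) => {
                self.live.insert(new_addr, new_size);
            }
            // Unknown block: treated like a new allocation within the window.
            None => self.on_alloc(new_addr, new_size),
        }
    }
}

fn main() {
    let mut tracker = Tracker::default();
    tracker.on_alloc(0x10, 8); // allocated before, untouched within
    tracker.on_alloc(0x20, 8); // allocated before, freed within
    tracker.on_alloc(0x30, 8); // allocated before, reallocated within
    tracker.running = true;
    tracker.on_dealloc(0x20);
    tracker.on_realloc(0x30, 0x40, 32);
    tracker.running = false;
    tracker.on_alloc(0x50, 8); // allocated after
    println!(
        "{} blocks counted, {} still live",
        tracker.total_blocks,
        tracker.live.len()
    );
}
```

The model shows why the third case cannot be ignored: once the reallocated block is inside the window, later frees of it would otherwise refer to an unknown address.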
§Viewing
Open a copy of DHAT’s viewer, version 3.17 or later. There are two ways to do this.
- Easier: Use the online version.
- Harder: Clone the Valgrind repository with git clone git://sourceware.org/git/valgrind.git and open dhat/dh_view.html. There is no need to build any code in this repository.
Then click on the “Load…” button to load dhat-heap.json or dhat-ad-hoc.json.
DHAT’s viewer shows a tree with nodes that look like this.
PP 1.1/2 {
Total: 1,024 bytes (98.46%, 14,422,535.21/s) in 1 blocks (50%, 14,084.51/s), avg size 1,024 bytes, avg lifetime 35 µs (49.3% of program duration)
Max: 1,024 bytes in 1 blocks, avg size 1,024 bytes
At t-gmax: 1,024 bytes (98.46%) in 1 blocks (50%), avg size 1,024 bytes
At t-end: 1,024 bytes (100%) in 1 blocks (100%), avg size 1,024 bytes
Allocated at {
#1: 0x10ae8441b: <alloc::alloc::Global as core::alloc::Allocator>::allocate (alloc/src/alloc.rs:226:9)
#2: 0x10ae8441b: alloc::raw_vec::RawVec<T,A>::allocate_in (alloc/src/raw_vec.rs:207:45)
#3: 0x10ae8441b: alloc::raw_vec::RawVec<T,A>::with_capacity_in (alloc/src/raw_vec.rs:146:9)
#4: 0x10ae8441b: alloc::vec::Vec<T,A>::with_capacity_in (src/vec/mod.rs:609:20)
#5: 0x10ae8441b: alloc::vec::Vec<T>::with_capacity (src/vec/mod.rs:470:9)
#6: 0x10ae8441b: std::io::buffered::bufwriter::BufWriter<W>::with_capacity (io/buffered/bufwriter.rs:115:33)
#7: 0x10ae8441b: std::io::buffered::linewriter::LineWriter<W>::with_capacity (io/buffered/linewriter.rs:109:29)
#8: 0x10ae8441b: std::io::buffered::linewriter::LineWriter<W>::new (io/buffered/linewriter.rs:89:9)
#9: 0x10ae8441b: std::io::stdio::stdout::{{closure}} (src/io/stdio.rs:680:58)
#10: 0x10ae8441b: std::lazy::SyncOnceCell<T>::get_or_init_pin::{{closure}} (std/src/lazy.rs:375:25)
#11: 0x10ae8441b: std::sync::once::Once::call_once_force::{{closure}} (src/sync/once.rs:320:40)
#12: 0x10aea564c: std::sync::once::Once::call_inner (src/sync/once.rs:419:21)
#13: 0x10ae81b1b: std::sync::once::Once::call_once_force (src/sync/once.rs:320:9)
#14: 0x10ae81b1b: std::lazy::SyncOnceCell<T>::get_or_init_pin (std/src/lazy.rs:374:9)
#15: 0x10ae81b1b: std::io::stdio::stdout (src/io/stdio.rs:679:16)
#16: 0x10ae81b1b: std::io::stdio::print_to (src/io/stdio.rs:1196:21)
#17: 0x10ae81b1b: std::io::stdio::_print (src/io/stdio.rs:1209:5)
#18: 0x10ae2fe20: dhatter::main (dhatter/src/main.rs:8:5)
}
}
Full details about the output are in the DHAT documentation. Note that DHAT uses the word “block” as a synonym for “allocation”.
When heap profiling, this crate doesn’t track memory accesses (unlike DHAT) and so the “reads” and “writes” measurements are not shown within DHAT’s viewer, and “sort metric” views involving reads, writes, or accesses are not available.
The backtraces produced by this crate are trimmed to reduce output file sizes and improve readability in DHAT’s viewer, in the following ways.
- Only one allocation-related frame will be shown at the top of the backtrace. That frame may be a function within alloc::alloc, a function within this crate, or a global allocation function like __rg_alloc.
- Common frames at the bottom of all backtraces, below main, are omitted.
Backtrace trimming is inexact and if the above heuristics fail more frames will be shown. ProfilerBuilder::trim_backtraces allows (approximate) control of how deep backtraces will be.
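As a rough illustration of the two trimming rules, the following toy function operates on a backtrace given as a list of function names, innermost frame first. It is a guess at the general shape of such a heuristic, not this crate's actual logic: `is_alloc_frame`, `trim`, and the prefix list are assumptions made for the example.

```rust
/// Returns true for frames that look like allocation machinery.
/// The prefixes are illustrative guesses, not the crate's real list.
fn is_alloc_frame(frame: &str) -> bool {
    ["alloc::alloc::", "<alloc::alloc::", "dhat::", "__rg_alloc"]
        .iter()
        .any(|prefix| frame.starts_with(prefix))
}

/// Trims a backtrace whose frames are ordered innermost first.
fn trim(frames: &[&str]) -> Vec<String> {
    // Rule 1: of the leading allocation-related frames, keep only the last.
    let leading = frames
        .iter()
        .take_while(|frame| is_alloc_frame(frame))
        .count();
    let start = leading.saturating_sub(1);

    // Rule 2: drop everything below the first frame that looks like `main`.
    let end = frames
        .iter()
        .position(|frame| frame.ends_with("::main"))
        .map_or(frames.len(), |index| index + 1);

    frames[start..end].iter().map(|f| f.to_string()).collect()
}

fn main() {
    let frames = [
        "__rg_alloc",
        "alloc::alloc::alloc",
        "alloc::vec::Vec<T>::with_capacity",
        "app::load",
        "app::main",
        "std::rt::lang_start",
        "_start",
    ];
    for (index, frame) in trim(&frames).iter().enumerate() {
        println!("#{}: {}", index + 1, frame);
    }
}
```

If no frame matches either pattern, the function returns the backtrace unchanged, which mirrors the statement that more frames are shown when the heuristics fail.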
§Heap usage testing
dhat lets you write tests that check that a certain piece of code does a certain amount of heap allocation when it runs. This is sometimes called “high water mark” testing. Sometimes it is precise (e.g. “this code should do exactly 96 allocations” or “this code should free all allocations before finishing”) and sometimes it is less precise (e.g. “the peak heap usage of this code should be less than 10 MiB”).
These tests are somewhat fragile, because heap profiling involves global state (allocation stats), which introduces complications.
- dhat will panic if more than one Profiler is running at a time, but Rust tests run in parallel by default. So parallel running of heap usage tests must be prevented.
- If you use something like the serial_test crate to run heap usage tests in serial, Rust’s test runner code by default still runs in parallel with those tests, and it allocates memory. These allocations will be counted by the Profiler as if they are part of the test, which will likely cause test failures.
Therefore, the best approach is to put each heap usage test in its own integration test file. Each integration test runs in its own process, and so cannot interfere with any other test. Also, if there is only one test in an integration test file, Rust’s test runner code does not use any parallelism, and so will not interfere with the test. If you do this, a simple cargo test will work as expected.
Alternatively, if you really want multiple heap usage tests in a single integration test file you can write your own custom test harness, which is simpler than it sounds.
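A custom harness can be as small as a `main` function that calls each test in turn. The sketch below assumes the integration test target is declared in Cargo.toml with Cargo's `harness = false` setting, so that the file's own `main` is the entry point. The test functions and `run_serially` are hypothetical; real heap usage tests would each create their own Profiler in testing mode.

```rust
use std::panic;

// Hypothetical heap usage tests, with placeholder bodies.
fn test_parse_allocations() {
    assert_eq!(2 + 2, 4);
}

fn test_render_allocations() {
    assert!("abc".len() == 3);
}

/// Runs each test on the current thread, one after another, and returns
/// the number of tests that panicked.
fn run_serially(tests: &[(&str, fn())]) -> usize {
    let mut failures = 0;
    for (name, test) in tests {
        match panic::catch_unwind(*test) {
            Ok(()) => println!("test {name} ... ok"),
            Err(_) => {
                println!("test {name} ... FAILED");
                failures += 1;
            }
        }
    }
    failures
}

fn main() {
    let tests: [(&str, fn()); 2] = [
        ("test_parse_allocations", test_parse_allocations),
        ("test_render_allocations", test_render_allocations),
    ];
    if run_serially(&tests) > 0 {
        // A non-zero exit status makes `cargo test` report the failure.
        std::process::exit(1);
    }
}
```

Because nothing here spawns threads, no test runner code runs concurrently with the tests.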
But integration tests have some limits. For example, they can only be used to test items from libraries, not binaries. One way to get around this is to restructure things so that most of the functionality is in a library, and the binary is a thin wrapper around the library.
Failing that, a blunt fallback is to run cargo test -- --test-threads=1. This disables all parallelism in tests, avoiding all the problems. This allows the use of unit tests and multiple tests per integration test file, at the cost of a non-standard invocation and slower test execution.
With all that in mind, configuration of Cargo.toml is much the same as for the profiling use case.
Here is an example showing what is possible. This code would go in an integration test within a crate’s tests/ directory:
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

#[test]
fn test() {
    let _profiler = dhat::Profiler::builder().testing().build();

    let _v1 = vec![1, 2, 3, 4];
    let v2 = vec![5, 6, 7, 8];
    drop(v2);
    let v3 = vec![9, 10, 11, 12];
    drop(v3);

    let stats = dhat::HeapStats::get();

    // Three allocations were done in total.
    dhat::assert_eq!(stats.total_blocks, 3);
    dhat::assert_eq!(stats.total_bytes, 48);

    // At the point of peak heap size, two allocations totalling 32 bytes existed.
    dhat::assert_eq!(stats.max_blocks, 2);
    dhat::assert_eq!(stats.max_bytes, 32);

    // Now a single allocation remains alive.
    dhat::assert_eq!(stats.curr_blocks, 1);
    dhat::assert_eq!(stats.curr_bytes, 16);
}
The testing call puts the profiler into testing mode, which allows the stats provided by HeapStats::get to be checked with dhat::assert! and similar assertions. These assertions work much the same as normal assertions, except that if any of them fail a heap profile will be saved.
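The numbers asserted in the test above can be checked by hand. The following standard-library-only model replays the same sequence of allocations and frees, assuming each four-element vector of `i32` occupies 16 bytes and that the block count at the peak is recorded whenever live bytes reach a new maximum. `Stats` and `replay` are hypothetical names that mirror the field names used in the test; this is not the crate's own bookkeeping.

```rust
/// Running heap statistics, mirroring the fields checked in the test.
#[derive(Debug, Default)]
struct Stats {
    total_blocks: u64,
    total_bytes: u64,
    curr_blocks: u64,
    curr_bytes: u64,
    max_blocks: u64,
    max_bytes: u64,
}

impl Stats {
    fn alloc(&mut self, size: u64) {
        self.total_blocks += 1;
        self.total_bytes += size;
        self.curr_blocks += 1;
        self.curr_bytes += size;
        // A new peak is recorded only when live bytes exceed the old maximum.
        if self.curr_bytes > self.max_bytes {
            self.max_bytes = self.curr_bytes;
            self.max_blocks = self.curr_blocks;
        }
    }

    fn free(&mut self, size: u64) {
        self.curr_blocks -= 1;
        self.curr_bytes -= size;
    }
}

/// Replays the allocation pattern of the example test.
fn replay() -> Stats {
    let mut stats = Stats::default();
    stats.alloc(16); // _v1, stays alive
    stats.alloc(16); // v2
    stats.free(16); // drop(v2)
    stats.alloc(16); // v3
    stats.free(16); // drop(v3)
    stats
}

fn main() {
    println!("{:?}", replay());
}
```

The peak is reached when `_v1` and `v2` are both alive; allocating `v3` later only ties that peak, so the maximum stays at two blocks and 32 bytes.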
When viewing the heap profile after a test failure, the best choice of sort metric in the viewer will depend on which stat was involved in the assertion failure.
- total_blocks: “Total (blocks)”
- total_bytes: “Total (bytes)”
- max_blocks or max_bytes: “At t-gmax (bytes)”
- curr_blocks or curr_bytes: “At t-end (bytes)”
This should give you a good understanding of why the assertion failed.
Note: if you try this example test it may work in a debug build but fail in a release build. This is because the compiler may optimize away some of the allocations that are unused. This is a common problem for contrived examples but less common for real tests. The std::hint::black_box function (stable since Rust 1.66) may also be helpful in this situation.
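A minimal sketch of that last suggestion, assuming a toolchain where `std::hint::black_box` is available (Rust 1.66 or later). `build_observed` is a hypothetical helper; `black_box` is only a best-effort hint to the optimizer, so this makes it less likely, not impossible, that an allocation is removed.

```rust
use std::hint::black_box;

/// Builds a vector and passes it through `black_box`, which asks the
/// compiler to treat the value as used in some unknown way.
fn build_observed() -> Vec<i32> {
    black_box(vec![1, 2, 3, 4])
}

fn main() {
    let values = build_observed();
    // Reading through `black_box` also discourages constant folding of the sum.
    let sum: i32 = black_box(&values).iter().sum();
    println!("{sum}");
}
```

In a heap usage test, wrapping each otherwise unused vector in `black_box` in this way is one option for keeping release builds closer to the allocation counts seen in debug builds.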
§Ad hoc usage testing
Ad hoc usage testing is also possible. It can be used to ensure certain code points in your program are hit a particular number of times during execution. It works in much the same way as heap usage testing, but ProfilerBuilder::ad_hoc must be specified, AdHocStats::get is used instead of HeapStats::get, and there is no possibility of Rust’s test runner code interfering with the tests.
Macros§
- assert: Asserts that an expression is true.
- assert_eq: Asserts that two expressions are equal.
- assert_ne: Asserts that two expressions are not equal.
Structs§
- AdHocStats: Stats from ad hoc profiling.
- Alloc: A global allocator that tracks allocations and deallocations on behalf of the Profiler type.
- HeapStats: Stats from heap profiling.
- Profiler: A type whose lifetime dictates the start and end of profiling.
Functions§
- ad_hoc_event: Registers an event during ad hoc profiling.